
    PiSDF: Parameterized & Interfaced Synchronous Dataflow for MPSoCs Runtime Reconfiguration

    International audience
    Dataflow models of computation are widely used for the specification, analysis, and optimization of Digital Signal Processing (DSP) applications. In this talk, we present the Parameterized and Interfaced Synchronous Dataflow (πSDF) model, which addresses the important challenge of managing dynamics in DSP-oriented representations. In addition to capturing application parallelism, an intrinsic feature of dataflow models, πSDF enables the specification of hierarchical and reconfigurable applications. The Synchronous Parameterized and Interfaced Dataflow Embedded Runtime (SPIDER) is also presented to support the execution of πSDF specifications on heterogeneous Multiprocessor Systems-on-Chips (MPSoCs).
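    In a parameterized dataflow graph, production and consumption rates may depend on parameters whose values are only known at runtime. Once the parameters are fixed for an iteration, the rates become constants. The standard SDF balance equations then apply, and the repetition vector (how many times each actor fires per iteration) has to be recomputed. The sketch below assumes this behavior. It encodes each parameterized rate as a Python callable over a parameter dict; that encoding is hypothetical and is not the πSDF or SPIDER format. It needs Python 3.9+ for multi-argument `gcd`/`lcm`.

```python
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(actors, edges, params):
    """Solve the SDF balance equations prod * q[src] == cons * q[dst].

    edges: list of (src, dst, prod, cons), where prod and cons are callables
    mapping the parameter dict to a positive integer rate (hypothetical
    encoding of parameterized rates, for illustration only).
    """
    # Resolve the parameterized rates for the current configuration.
    rates = [(s, d, p(params), c(params)) for s, d, p, c in edges]
    q = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the edges
        changed = False
        for s, d, p, c in rates:
            if s in q and d not in q:
                q[d] = q[s] * p / c
                changed = True
            elif d in q and s not in q:
                q[s] = q[d] * c / p
                changed = True
    if len(q) != len(actors):
        raise ValueError("graph is not connected")
    for s, d, p, c in rates:  # every balance equation must hold
        if q[s] * p != q[d] * c:
            raise ValueError("inconsistent rates: no valid repetition vector")
    # Scale to the smallest positive integer solution.
    scale = lcm(*(f.denominator for f in q.values()))
    ints = {a: int(f * scale) for a, f in q.items()}
    g = gcd(*ints.values())
    return {a: n // g for a, n in ints.items()}


# Hypothetical graph A -> B -> C whose rates depend on a parameter N.
edges = [("A", "B", lambda P: P["N"], lambda P: 1),
         ("B", "C", lambda P: 1, lambda P: P["N"])]
print(repetition_vector(["A", "B", "C"], edges, {"N": 4}))  # {'A': 1, 'B': 4, 'C': 1}
```

    Changing `N` between iterations changes how many times `B` fires without changing the graph structure. This is the kind of dynamism a static SDF graph cannot express.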

    Numerical Representation of Directed Acyclic Graphs for Efficient Dataflow Embedded Resource Allocation

    International audience
    Stream processing applications running on Heterogeneous Multi-Processor Systems-on-Chips (HMPSoCs) require efficient resource allocation and management, both at compile time and at runtime. Modern adaptive applications have behavior that cannot be exhaustively predicted at compile time. To cope with them, runtime managers must make resource allocation decisions on the fly, with minimal overhead on application performance. Resource allocation algorithms often rely on an internal model of the application. Directed Acyclic Graphs (DAGs) are the most commonly used models for capturing control and data dependencies between tasks. Notably, they often serve as an intermediate representation for deploying applications modeled with a dataflow Model of Computation (MoC) on HMPSoCs. Building such an intermediate representation at runtime for massively parallel applications is costly in terms of both computation and memory. In this paper, a new intermediate representation of DAGs for resource allocation is presented. It improves the runtime analysis of dataflow graphs while reducing both computation time and memory footprint. The performance of the proposed representation is evaluated on a set of computer vision and machine learning applications.
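    To show why building a DAG at runtime can be expensive, the sketch below performs the conventional single-rate expansion of a delay-free SDF graph. Each actor firing becomes a DAG vertex, and each token creates a precedence edge. It does not reproduce the numerical representation proposed in the paper. The function name and the graph are hypothetical, and the sketch assumes constant integer rates and no initial tokens (delays).

```python
def expand_to_dag(q, edges):
    """Single-rate expansion of a delay-free SDF graph into a DAG.

    q: repetition vector {actor: firings per iteration}
    edges: list of (src, dst, prod, cons) with constant integer rates.
    Returns (vertices, dag_edges); parallel precedences between the same
    pair of firings are merged into one DAG edge in this sketch.
    """
    vertices = [(a, i) for a, n in q.items() for i in range(n)]
    dag_edges = set()
    for src, dst, prod, cons in edges:
        total = q[src] * prod  # tokens exchanged on this edge per iteration
        if total != q[dst] * cons:
            raise ValueError("q does not satisfy the balance equations")
        for k in range(total):
            # token k is written by firing k // prod and read by firing k // cons
            dag_edges.add(((src, k // prod), (dst, k // cons)))
    return vertices, dag_edges


# The DAG grows with the repetition vector, not with the SDF graph size.
for n in (4, 1000):
    q = {"A": 1, "B": n, "C": 1}
    v, e = expand_to_dag(q, [("A", "B", n, 1), ("B", "C", 1, n)])
    print(n, len(v), len(e))  # "4 6 8", then "1000 1002 2000"
```

    The SDF graph has three actors in both cases, yet the expanded DAG for `n = 1000` has over a thousand vertices and two thousand edges. This growth is the computation and memory overhead the abstract refers to.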

    Memory Bounds for the Distributed Execution of a Hierarchical Synchronous Data-Flow Graph

    International audience
    This paper presents an application analysis technique to derive bounds on the shared memory requirements of a Multiprocessor System-on-Chip (MPSoC) in the early stages of development. This technique is part of a rapid prototyping process and is based on the analysis of a hierarchical Synchronous Data-Flow (SDF) graph description of the application. The analysis does not require any knowledge of the system architecture, or of the mapping or scheduling of the application tasks. The first step of the method applies a set of transformations to the SDF graph to reveal its memory characteristics. These transformations produce a weighted graph that represents the memory objects of the application and the memory allocation constraints arising from their relationships. The memory bounds are then derived from this weighted graph by solving analogous graph theory problems, in particular the Maximum-Weight Clique (MWC) problem. State-of-the-art algorithms for these problems are presented, and a heuristic approach is proposed to find a near-optimal solution to the MWC problem. The heuristic is evaluated on hierarchical SDF graphs of realistic applications, and the evaluation shows its efficiency in finding near-optimal solutions.
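    The sketch below assumes a weighted exclusion graph with the following structure. Each vertex is a memory object weighted by its size, and an edge joins two objects that may be live simultaneously and so cannot share memory. The objects in any clique must all occupy disjoint address ranges, so the weight of the maximum-weight clique is a lower bound on the memory needed. The sketch finds that clique exactly with Bron-Kerbosch enumeration, which is exponential in the worst case. It is not the heuristic proposed in the paper. The sizes and edges are hypothetical, and the sum of all weights is shown only as a trivial upper bound, not necessarily the one derived in the paper.

```python
def max_weight_clique(weights, exclusions):
    """Exact maximum-weight clique via Bron-Kerbosch enumeration.

    weights: {memory_object: size}
    exclusions: iterable of 2-element frozensets; an edge means the two
    objects cannot share memory. Only suitable for small graphs.
    """
    adj = {v: set() for v in weights}
    for edge in exclusions:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    best_weight, best_clique = 0, frozenset()

    def expand(r, p, x):
        nonlocal best_weight, best_clique
        if not p and not x:  # r is a maximal clique
            w = sum(weights[v] for v in r)
            if w > best_weight:
                best_weight, best_clique = w, frozenset(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(weights), set())
    return best_weight, best_clique


sizes = {"a": 100, "b": 50, "c": 80, "d": 30}  # hypothetical buffer sizes
meg = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}
lower, clique = max_weight_clique(sizes, meg)
upper = sum(sizes.values())  # trivial bound: every object in its own space
print(lower, sorted(clique), upper)  # 230 ['a', 'b', 'c'] 260
```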

    Pre- and post-scheduling memory allocation strategies on MPSoCs

    6 pages, International audience
    This paper introduces and assesses a new method for allocating memory to applications implemented on a shared-memory Multiprocessor System-on-Chip (MPSoC). The method first derives, from a Synchronous Dataflow (SDF) algorithm description, a Memory Exclusion Graph (MEG) that models all the memory objects of the application and their allocation constraints. Based on the MEG, memory allocation can be performed at three different stages of the implementation process: before scheduling, after an untimed multicore schedule is decided, or after a timed multicore schedule is decided. Each alternative offers a distinct trade-off between the amount of allocated memory and the flexibility of the multicore execution. The test cases are based on descriptions of real applications and on a set of random SDF graphs generated with the SDF For Free (SDF3) tool. Experimental results compare several allocation heuristics at the three implementation stages. They show that allocating memory after an untimed schedule has been decided yields a reduced memory footprint while preserving a flexible multicore execution.
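    The abstract mentions several allocation heuristics. One common family places memory objects at offsets driven by the exclusion graph. The sketch below is a generic first-fit allocator under that assumption. Objects joined by an exclusion edge get non-overlapping address ranges, while non-exclusive objects may overlap. The largest-first ordering, the function name and the example data are hypothetical choices, not necessarily the heuristics compared in the paper. In this sketch, information gained from a schedule would simply appear as a different set of exclusion edges passed in.

```python
def first_fit(weights, exclusions, order=None):
    """First-fit offset assignment driven by a memory exclusion graph.

    Objects joined by an exclusion edge must get non-overlapping address
    ranges; objects without an edge between them may share memory.
    order: allocation order; defaults to largest object first (an assumption).
    """
    adj = {v: set() for v in weights}
    for edge in exclusions:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    if order is None:
        order = sorted(weights, key=lambda v: -weights[v])
    offsets = {}
    for v in order:
        # Address ranges already taken by exclusive neighbours, sorted by start.
        busy = sorted((offsets[u], offsets[u] + weights[u])
                      for u in adj[v] if u in offsets)
        offset = 0
        for start, end in busy:
            if offset + weights[v] <= start:  # object fits in this gap
                break
            offset = max(offset, end)
        offsets[v] = offset
    footprint = max((offsets[v] + weights[v] for v in offsets), default=0)
    return offsets, footprint


sizes = {"a": 100, "b": 50, "c": 80, "d": 30}  # hypothetical buffer sizes
meg = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}
offsets, footprint = first_fit(sizes, meg)
print(offsets, footprint)  # {'a': 0, 'c': 100, 'b': 180, 'd': 0} 230
```

    Here `d` reuses part of the range of `a` because the two objects are not exclusive. The resulting footprint (230) is smaller than the 260 needed if every object had its own space.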

    PREESM: A Dataflow-Based Rapid Prototyping Framework for Simplifying Multicore DSP Programming

    International audience
    The high-performance Digital Signal Processors (DSPs) currently manufactured by Texas Instruments are heterogeneous multiprocessor architectures. Programming these architectures is a complex task often reserved for specialized engineers, because the bottlenecks of both the algorithm and the architecture must be deeply understood to obtain a reasonably parallel execution. The objective of the PREESM framework is to simplify the programming of multicore DSP systems by building on dataflow programming methods. The current functionalities of this scalable framework cover memory and time analysis, as well as automatic deadlock-free code generation. Several tutorials are provided with the tool to quickly introduce C programmers to multicore DSP programming. This paper demonstrates the capabilities of PREESM by comparing simulated and actual execution performance for a stereo matching algorithm prototyped on the TMS320C6678 8-core DSP device.

    Models of Architecture: Reproducible Efficiency Evaluation for Signal Processing Systems

    International audience
    The current trend in high-performance and embedded signal processing is to design increasingly complex heterogeneous hardware architectures with non-uniform communication resources. Making hardware and software design decisions requires early evaluations of the non-functional properties of the system. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In this paper, we define the notion of a Model of Architecture (MoA) and study the combination of a Model of Computation (MoC) and an MoA to provide a design space exploration environment for studying algorithmic and architectural choices. A cost is computed from the mapping of an application, represented by a model conforming to an MoC, onto an architecture, represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system, such as memory, energy, throughput, or latency.

    Models of Architecture

    The current trend in high-performance and embedded computing is to design increasingly complex heterogeneous hardware architectures with non-uniform communication resources. Making hardware and software design decisions requires early evaluations of the non-functional properties of the system. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In state-of-the-art Model-Driven Engineering (MDE) methods, different communities have developed custom architecture models associated with languages of substantial complexity. This contrasts with Models of Computation (MoCs), which provide abstract representations of an algorithm's behavior as well as tool interoperability. In this report, we define the notion of a Model of Architecture (MoA) and study the combination of an MoC and an MoA to provide a design space exploration environment for studying algorithmic and architectural choices. An MoA provides reproducible cost computation for evaluating the efficiency of a system. A new MoA, the Linear System-Level Architecture Model (LSLA), is introduced and compared to state-of-the-art models. LSLA aims to represent hardware efficiency with a linear model. The computed cost results from the mapping of an application, represented by a model conforming to an MoC, onto an architecture, represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system, such as memory, energy, throughput, or latency.
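    The abstract does not give the exact LSLA cost definition. The sketch below only illustrates the general idea it describes: a mapping's cost is a scalar made of a processing-related part and a communication-related part, each computed with linear functions. The names `LinearCost` and `mapping_cost`, the per-unit and fixed coefficients, and the assumption that transfers within one processing element are free are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearCost:
    """Hypothetical linear cost function: per_unit * amount + fixed."""
    per_unit: float
    fixed: float = 0.0

    def __call__(self, amount: float) -> float:
        return self.per_unit * amount + self.fixed


def mapping_cost(actors, edges, mapping, pe_cost, link_cost):
    """Abstract scalar cost = processing part + communication part.

    actors: {actor: work units}; edges: [(src, dst, tokens)]
    mapping: {actor: processing element}
    pe_cost: {pe: LinearCost}; link_cost: {sorted (pe, pe) pair: LinearCost}
    """
    processing = sum(pe_cost[mapping[a]](work) for a, work in actors.items())
    communication = 0.0
    for src, dst, tokens in edges:
        pe_a, pe_b = mapping[src], mapping[dst]
        if pe_a != pe_b:  # intra-PE transfers are assumed free in this sketch
            communication += link_cost[tuple(sorted((pe_a, pe_b)))](tokens)
    return processing + communication


actors = {"src": 10, "filt": 40, "sink": 5}
edges = [("src", "filt", 8), ("filt", "sink", 8)]
pe_cost = {"dsp": LinearCost(1.0), "arm": LinearCost(2.0, 1.0)}
link_cost = {("arm", "dsp"): LinearCost(0.5, 2.0)}
print(mapping_cost(actors, edges, {a: "dsp" for a in actors}, pe_cost, link_cost))  # 55.0
print(mapping_cost(actors, edges, {"src": "arm", "filt": "dsp", "sink": "arm"},
                   pe_cost, link_cost))  # 84.0
```

    Because the cost is a single scalar, a design space exploration loop could compare candidate mappings directly by this value. The coefficients could be chosen to represent energy, latency or memory.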

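    Memory Study and Dataflow Representations for the Rapid Prototyping of Signal Processing Applications on MPSoCs (Etude mémoire et représentations flux de données pour le prototypage rapide d'applications de traitement du signal sur MPSoCs)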

    The development of embedded Digital Signal Processing (DSP) applications for Multiprocessor Systems-on-Chips (MPSoCs) is a complex task requiring the consideration of many constraints, including real-time requirements, power consumption restrictions, and limited hardware resources. To satisfy these constraints, it is critical to understand the general characteristics of a given application: its behavior and its requirements in terms of MPSoC resources. In particular, the memory requirements of an application strongly impact the quality and performance of an embedded system. Memory can occupy as much as 80% of the silicon area of a chip and may be responsible for a major part of its power consumption. Despite this large share of resources, limited memory remains an important constraint that considerably increases the development time of embedded systems. Dataflow Models of Computation (MoCs) are widely used for the specification, analysis, and optimization of DSP applications. Their popularity is due to their strong analyzability and their natural ability to express the parallelism of a DSP application. The abstraction of time in dataflow MoCs is particularly well suited to exploiting the parallelism offered by heterogeneous MPSoCs. In this thesis, we propose a complete method to study the memory characteristics of a DSP application modeled with a dataflow graph. The proposed method spans from theoretical, architecture-independent memory characterization to quasi-optimal static memory allocation of an application on a real shared-memory MPSoC. The proposed method, implemented as part of a rapid prototyping framework, is extensively tested on a set of state-of-the-art applications from the computer vision, telecommunication, and multimedia domains.
    Then, because the dataflow MoC used in our method cannot model applications with dynamic behavior, we introduce a new dataflow meta-model to address the important challenge of managing dynamics in DSP-oriented representations. This reconfigurable and composable meta-model strengthens the predictability, conciseness, and readability of application descriptions.